JIRA <-> IMPACT sync for Status and Priority #209
Conversation
…oManoTech/firefighter-incident into feat/impact_jira_status_sync
for more information, see https://pre-commit.ci
…oManoTech/firefighter-incident into feat/impact_jira_status_sync
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #209      +/-   ##
==========================================
- Coverage   67.08%   66.93%   -0.16%
==========================================
  Files         212      212
  Lines       10029    10319     +290
  Branches     1099     1158      +59
==========================================
+ Hits         6728     6907     +179
- Misses       3030     3100      +70
- Partials      271      312      +41
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
NicolasLafitteMM left a comment:
Code Review Summary
Thank you for this comprehensive PR! The bidirectional Jira sync is a great feature with excellent documentation and test coverage. However, I've identified a few critical issues that should be addressed before merging.
Critical Issues (Must Fix)
- Excessive WARNING logging - Normal operations logged as warnings will pollute production logs
- Missing null check - Potential AttributeError when accessing incident.id in loop prevention
- Error handling in MITIGATING transition - First transition failure prevents second attempt, leaving Jira in inconsistent state
Medium Priority Issues
- Code duplication - `_normalize_cache_value()` duplicated in two files (see the sketch after this comment)
- Silent error handling - Returning `True` on failures masks problems
- N+1 query potential - JiraTicket lookup inside loop
I'll add inline comments on specific lines with detailed recommendations.
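On the duplication point, a minimal sketch of pulling the helper into one shared module that both call sites import; the module path and the function body here are hypothetical illustrations, not the PR's actual code:

```python
# raid/sync_cache.py (hypothetical shared module)
def _normalize_cache_value(value: object) -> str:
    """Render a field value into a stable, cache-key-safe token.

    The exact normalization is an assumption for illustration; the point is
    that the signal handler and the webhook serializer both import this one
    definition instead of each keeping a private copy.
    """
    return str(value).strip().lower().replace(" ", "_")
```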
Code Review - Based on Actual Branch Code

I've now checked out the branch and reviewed the actual code. Here are my findings:

✅ What's Excellent

🔴 Critical Issues (Must Fix)

I'll post these as separate comments with detailed recommendations.
🔴 Critical #1: Excessive WARNING logging

File:

Every normal incident update is logged as WARNING, which will severely pollute production logs:

```python
logger.warning(
    "incident_updated handler invoked for incident #%s with status %s; updated_fields=%s event_type=%s",
    getattr(incident, "id", "unknown"),
    incident_update.status,
    updated_fields,
    incident_update.event_type,
)
```

Impact: This handler is called for EVERY incident status update. In production, this will generate thousands of WARNING logs daily for normal operations.

Fix:

```python
logger.debug(  # or logger.info at most
    "incident_updated handler invoked for incident #%s with status %s; updated_fields=%s event_type=%s",
    getattr(incident, "id", "unknown"),
    incident_update.status,
    updated_fields,
    incident_update.event_type,
)
```

Principle: WARNING should be reserved for actual problems, not normal operations. This is especially important for monitoring and alerting systems.
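To make the principle concrete, here is a minimal sketch of a Django `LOGGING` setting under a typical production threshold; the logger name `firefighter.raid` is an assumption for illustration:

```python
# Hypothetical settings.py excerpt: with the threshold at WARNING,
# logger.warning() calls from the sync handler reach production logs on
# every incident update, while logger.debug() calls are filtered out.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Logger name is an assumption for illustration.
        "firefighter.raid": {"handlers": ["console"], "level": "WARNING"},
    },
}
```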
🔴 Critical #2: Error handling in MITIGATING double transition

File:

When transitioning to MITIGATING status, if the first transition ("Pending resolution") fails, the handler returns early and the second transition ("in progress") is never attempted, leaving Jira in an inconsistent state:

```python
if incident_update.status == IncidentStatus.MITIGATING:
    for step in ("Pending resolution", "in progress"):
        try:
            client.transition_issue_auto(
                incident.jira_ticket.id, step, RAID_JIRA_WORKFLOW_NAME
            )
        except Exception:
            logger.exception(...)
            return  # ❌ Gives up completely - second transition never attempted
```

Problem Scenarios:
Recommendation:
```python
if incident_update.status == IncidentStatus.MITIGATING:
    errors = []
    for step in ("Pending resolution", "in progress"):
        try:
            logger.debug(...)
            client.transition_issue_auto(
                incident.jira_ticket.id, step, RAID_JIRA_WORKFLOW_NAME
            )
        except Exception as e:
            logger.exception(...)
            errors.append((step, e))
            # Don't return - continue to next step
    if not errors:
        logger.info("Successfully transitioned through both steps")
    elif len(errors) == 2:
        logger.error("Both transitions failed")
    else:
        logger.warning("Partial transition completed")
    return
```
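For reference, a self-contained, runnable sketch of the same collect-errors pattern; `StubJiraClient`, the issue id, and the workflow name are illustrative stand-ins for the real client:

```python
import logging

logger = logging.getLogger(__name__)

class StubJiraClient:
    """Fails the first transition to simulate a partial failure."""
    def transition_issue_auto(self, issue_id: str, step: str, workflow: str) -> None:
        if step == "Pending resolution":
            raise RuntimeError("transition not available from current status")

def transition_to_mitigating(client: StubJiraClient, issue_id: str, workflow: str) -> None:
    steps = ("Pending resolution", "in progress")
    errors = []
    for step in steps:
        try:
            client.transition_issue_auto(issue_id, step, workflow)
        except Exception as exc:  # keep going so the second step still runs
            logger.exception("Transition %r failed", step)
            errors.append((step, exc))
    if not errors:
        logger.info("Successfully transitioned through both steps")
    elif len(errors) == len(steps):
        logger.error("Both transitions failed")
    else:
        logger.warning("Partial transition completed; failed steps: %s",
                       [step for step, _ in errors])

transition_to_mitigating(StubJiraClient(), "RAID-123", "raid-workflow")
```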
💡 Suggestion #5: Potential N+1 query in webhook processing

File:

The `create()` method triggers a fresh incident lookup for every changelog item:

```python
def create(self, validated_data: dict[str, Any]) -> bool:
    changes = validated_data.get("changelog", {}).get("items") or []
    for change_item in changes:  # ← Loop over changes
        self._sync_jira_fields_to_incident(...)  # ← Calls _get_incident_from_jira_ticket

def _handle_status_update(...):
    incident = self._get_incident_from_jira_ticket(jira_ticket_key)  # ← DB query

def _handle_priority_update(...):
    incident = self._get_incident_from_jira_ticket(jira_ticket_key)  # ← DB query
```

Impact:
Optimization:

```python
def create(self, validated_data: dict[str, Any]) -> bool:
    changes = validated_data.get("changelog", {}).get("items") or []
    if not changes:
        return True
    jira_ticket_key = validated_data["issue"].get("key")
    # Fetch JiraTicket + Incident once, outside the loop
    incident = self._get_incident_from_jira_ticket(jira_ticket_key)
    if incident is None:
        return True  # No incident linked, skip all changes
    for change_item in changes:
        self._sync_jira_fields_to_incident(
            validated_data, incident, change_item  # Pass pre-fetched incident
        )
    return True
```

Note: This is a minor optimization. In practice, webhooks rarely have multiple relevant changes, so the impact is small. Consider this a "nice to have" rather than critical.
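One way to lock in the single-lookup behavior would be a query-count regression test with Django's `assertNumQueries`. A sketch; the payload helper, serializer class name, and expected count are assumptions for illustration:

```python
from django.test import TestCase

class WebhookQueryCountTest(TestCase):
    def test_incident_fetched_once_for_many_changes(self):
        # make_jira_webhook_payload is a hypothetical test helper producing
        # a changelog with several items for a single issue key.
        payload = make_jira_webhook_payload(issue_key="RAID-123", num_changes=5)
        serializer = JiraWebhookSerializer()  # hypothetical class name
        # Expect one JiraTicket/Incident lookup regardless of item count;
        # adjust the count once the updates themselves hit the DB.
        with self.assertNumQueries(1):
            serializer.create(payload)
```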
💡 Suggestion #6: Cache timeout configuration

File:

The 30-second cache timeout is hardcoded:

```python
def _set_impact_to_jira_cache(
    incident_id: Any, field: str, value: Any, timeout: int = 30
) -> None:
```

Concern:

Recommendation:

```python
from django.conf import settings

# Default to 60s for safer margin
JIRA_SYNC_CACHE_TIMEOUT = getattr(settings, 'JIRA_SYNC_CACHE_TIMEOUT', 60)

def _set_impact_to_jira_cache(
    incident_id: Any,
    field: str,
    value: Any,
    timeout: int = JIRA_SYNC_CACHE_TIMEOUT
) -> None:
    cache_key = (
        f"sync:impact_to_jira:{incident_id}:{field}:{_normalize_cache_value(value)}"
    )
    cache.set(cache_key, value=True, timeout=timeout)
```

Benefits:

Alternative: Just increase the default to 60 seconds if configurability isn't needed.
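One caveat with the sketch above: `getattr(settings, ...)` runs at import time and the result is baked into the function's default argument, so `override_settings` in tests (or any runtime config change) won't affect it. A variant that reads the setting at call time, as a sketch reusing the PR's `_normalize_cache_value` helper:

```python
from typing import Any

from django.conf import settings
from django.core.cache import cache

def _sync_cache_timeout() -> int:
    # Read at call time so tests and runtime overrides take effect.
    return getattr(settings, "JIRA_SYNC_CACHE_TIMEOUT", 60)

def _set_impact_to_jira_cache(
    incident_id: Any, field: str, value: Any, timeout: int | None = None
) -> None:
    cache_key = (
        f"sync:impact_to_jira:{incident_id}:{field}:{_normalize_cache_value(value)}"
    )
    cache.set(cache_key, value=True, timeout=timeout or _sync_cache_timeout())
```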
💡 Code Quality #7: Magic strings for Jira statuses

File:

Jira status strings are hardcoded throughout:

```python
for step in ("Pending resolution", "in progress"):  # Line 99
    ...
target_jira_status = "Closed"  # Line 140
```

Recommendation:

```python
# Module-level constants
JIRA_STATUS_INCOMING = "Incoming"
JIRA_STATUS_PENDING_RESOLUTION = "Pending resolution"
JIRA_STATUS_IN_PROGRESS = "in progress"
JIRA_STATUS_REPORTER_VALIDATION = "Reporter validation"
JIRA_STATUS_CLOSED = "Closed"

IMPACT_TO_JIRA_STATUS_MAP: dict[IncidentStatus, str] = {
    IncidentStatus.OPEN: JIRA_STATUS_INCOMING,
    IncidentStatus.INVESTIGATING: JIRA_STATUS_IN_PROGRESS,
    IncidentStatus.MITIGATING: JIRA_STATUS_IN_PROGRESS,
    IncidentStatus.MITIGATED: JIRA_STATUS_REPORTER_VALIDATION,
    IncidentStatus.POST_MORTEM: JIRA_STATUS_REPORTER_VALIDATION,
    IncidentStatus.CLOSED: JIRA_STATUS_CLOSED,
}

# Then use:
for step in (JIRA_STATUS_PENDING_RESOLUTION, JIRA_STATUS_IN_PROGRESS):
    ...
target_jira_status = JIRA_STATUS_CLOSED
```

Benefits:
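A possible consumer of that mapping, sketched under the assumption that unmapped statuses should simply be skipped (the PR doesn't specify that fallback):

```python
target = IMPACT_TO_JIRA_STATUS_MAP.get(incident_update.status)
if target is None:
    # No Jira equivalent defined for this IMPACT status; skip the sync.
    logger.debug("No Jira status mapped for %s", incident_update.status)
else:
    client.transition_issue_auto(
        incident.jira_ticket.id, target, RAID_JIRA_WORKFLOW_NAME
    )
```

Note that MITIGATING still requires the two-step transition discussed in Critical #2, so a flat status map alone doesn't cover that case.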
❓ Questions for the Author

I have a few important questions:

1. Empty string in
🎯 Final Verdict & Summary

Great work on this PR! The bidirectional sync is a valuable feature with excellent documentation.

✅ Strengths

🔴 Blocking Issues (2)

Must be fixed before merge:
In this case, returning `True` does not mean we are synced; it means we handled the signal in the expected way. A comment was added to the code for clarification.
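Something along these lines, presumably; the exact wording of the added comment isn't shown in this thread:

```python
def create(self, validated_data: dict[str, Any]) -> bool:
    # Returning True means the webhook event was handled the way we expect
    # (including the "no linked incident" case); it does NOT claim that the
    # incident and the Jira issue are now in sync.
    ...
```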
Added config.
for more information, see https://pre-commit.ci
…oManoTech/firefighter-incident into feat/impact_jira_status_sync
https://manomano.atlassian.net/browse/GT-1773